OpenAIRE, the Open Access Infrastructure for Research in Europe, comprises a database of all EC FP7 and H2020 funded research projects, including metadata of their results (publications and datasets). These data are stored in an HBase NoSQL database, post-processed, and exposed as HTML for human consumption and as XML through a web service interface. As an intermediate format to facilitate statistical computations, CSV is generated internally. To interlink the OpenAIRE data with related data on the Web, we aim at exporting them as Linked Open Data (LOD). The LOD export must integrate into the overall data processing workflow, where derived data are regenerated from the base data every day. We thus faced the challenge of identifying the best-performing conversion approach. We evaluated the performance of three approaches to creating LOD: a MapReduce job on top of HBase, a mapping of the intermediate CSV files, and a mapping of the XML output.
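To make the CSV route more concrete, the following is a minimal sketch of mapping CSV rows to RDF triples serialized as N-Triples. It is not the mapping used for OpenAIRE. The column names (`id`, `title`, `acronym`), the base URIs under `example.org`, and the `Project` class and `acronym` property are hypothetical placeholders. Only `rdf:type` and Dublin Core `dcterms:title` are standard vocabulary URIs.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespaces; the actual OpenAIRE LOD URIs are not given here.
BASE = "http://example.org/openaire/project/"
ONTO = "http://example.org/openaire/ontology#"
# Standard vocabulary URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_TITLE = "http://purl.org/dc/terms/title"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str) -> list[str]:
    """Map each CSV row (hypothetical columns: id, title, acronym) to N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Percent-encode the identifier so it is safe inside an IRI.
        subject = f"<{BASE}{quote(row['id'], safe='')}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{ONTO}Project> .")
        triples.append(f'{subject} <{DCT_TITLE}> "{escape_literal(row["title"])}" .')
        # Emit the optional acronym only when the column is non-empty.
        if row.get("acronym"):
            triples.append(f'{subject} <{ONTO}acronym> "{escape_literal(row["acronym"])}" .')
    return triples
```

Because each row is mapped independently, the conversion streams over the file without holding the whole dataset in memory. This is one reason why a row-wise mapping of an already generated intermediate format is a plausible candidate alongside a MapReduce job over the primary store.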